The Relativity Concept Inventory: development, analysis and results

We report on a concept inventory for special relativity: the development process, data analysis
methods, and results from an introductory relativity class. The Relativity Concept Inventory tests 
understanding of kinematic relativistic concepts. An unusual feature is confidence testing for each 
question. This can provide additional information; for example high confidence correlated with 
incorrect answers suggests a misconception. A novel aspect of our data analysis is the use of Monte 
Carlo simulations to determine the significance of correlations. This approach is particularly useful 
for small sample sizes, such as ours. Our results include a gender bias that was not present in other
assessments, similar to that reported for the Force Concept Inventory.

PACS numbers: 01.40.G-, 01.40.Di, 01.40.Fk, 01.40.gf 



I. INTRODUCTION 

Concept inventories are used to assess learning in many
areas of physics education. When used to determine
the effectiveness of educational innovations they may 
contribute to the teaching development cycle. Since the 
literature on special relativity education research does 
not include a concept inventory we have developed the 
Relativity Concept Inventory (RCI), available from the 
Supplemental Appendix to this paper. 

Special relativity is interesting in a physics education 
research context because of its combination of deeply 
challenging concepts and simple mathematics. This 
is in contrast with quantum mechanics, which has a 
more complex mathematical structure. Nevertheless, the
amount of physics education research on special relativity
is small.

The RCI has been validated by feedback from discipline
experts, and its validity and reliability established
by standard methods. These include the self-referential
statistics of classical test theory, and benchmarking
against traditional assessment such as homework
and an exam. We have also developed and applied
Monte Carlo simulation techniques suitable for the anal- 
ysis of correlations in data with small sample size. 

In the next section we describe the process used to develop
the RCI. In section III we characterize the students
to whom the RCI was administered. In section IV we describe
the methods used to analyse the collected data, including
the use of item response theory to control for the effect
of student ability on correlations between questions, and
Monte Carlo modeling to estimate the statistical significance
of correlations. In section V we present misconceptions
diagnosed by the RCI and evidence for its gender
bias. Finally, in the Conclusions, we suggest revisions of
the RCI. We also argue that understanding the gender
bias in concept inventories is a significant problem for
physics education research.



II. THE DEVELOPMENT PROCESS 

The development of the RCI followed Adams and Wieman
insofar as our six-month project schedule allowed.
In particular, student interviews were not relied
on as much as they suggest. The only previous
attempt to develop a concept inventory for special relativity
is reported in Gibson's doctoral thesis [15].

We first formulated a list of concepts that captured the
learning goals of the introductory relativity instruction in
the Physics 2 course at The Australian National University
(ANU). These concepts were also informed by relevant
textbooks and the physics education research
literature.

Expert feedback on each of fourteen draft concepts of
introductory relativity was obtained from thirty international
respondents [17] using an online survey. Agreement
with the appropriateness of the concepts in our
list ranged from 100% for the first postulate to 50%. After
individual consideration, concepts with agreement below
75% were dropped from the list. The final list of nine
concepts is given in Table I.

These concepts were used to develop twenty-four draft 
RCI multiple-choice questions, with one, two or three 
questions primarily addressing each of the concepts. Ex- 
pert feedback on the draft RCI questions was obtained 
from seven respondents using another online survey. In 
addition, a face-to-face interview was conducted with the 
ANU academic teaching advanced special relativity. 

The draft RCI was then administered to six fourth-year physics
students. These students were also asked to write a sentence
or two explaining their reasoning for each question.
Next, the RCI was taken by three second-year students
in think-aloud format: students were asked to verbalise
their thinking while answering the RCI questions. These
students had taken Physics 2 the previous year. These
sessions were recorded and transcribed for study.

TABLE I: The concepts tested by the RCI. The Questions column lists the question numbers we classified as associated with
each concept. Although some questions clearly test more than one concept, we have allocated each question to only one concept.

| Concept | Description | Questions |
|---|---|---|
| First postulate | The laws of physics are the same in all inertial reference frames. | 16, 18, 19, 20 |
| Second postulate | The speed of light in a vacuum is the same in all reference frames. | 3, 4 |
| Time dilation | The time interval between two time-like separated events is shortest in the reference frame for which the two events are at the same position. The time between these events is greater in all other frames. | 5, 6, 7, 8 |
| Length contraction | The length of an object (defined as the space interval between two simultaneous events at either end of the object) is the longest in the frame in which the ends of the object are at rest, and is shorter in all other frames. | 13, 14, 17 |
| Relativity of simultaneity | If two events A and B are space-like separated, then there exist inertial frames in which A precedes B, and others in which B precedes A. | 11, 12, 15, 21 |
| Inertial reference frame | A coordinate system in which a free particle will maintain constant velocity; in particular, the concept that all inertial frames are equivalent. | 1, 2 |
| Velocity addition | Velocities transform between frames such that no object can be observed travelling faster than the speed of light in a vacuum. | 9, 10 |
| Causality | If two events are time-like separated, then the ordering of the events is fixed for all reference frames. | 22, 23 |
| Mass energy equivalence | Energy has inertia. | 24 |

The RCI was then administered online to the 2012
ANU Physics 2 class, prior to instruction as a pre-test,
and after instruction as a post-test. Neither contributed
to the course assessment. Students' RCI post-test responses
were compared to their answers to the relativity
questions in the Physics 2 mid-course exam, which
included short answer conceptual questions.

All this feedback was used to continuously improve the
draft RCI. Wording was clarified when found to be ambiguous,
and questions were deleted when it was determined
they were not adequately addressing desired concepts.
The final version of the RCI is available from
the Supplemental Appendix to this paper. It consists
of twenty-four multiple choice questions, each having
a confidence scale. Example questions are given in
Table II. Throughout this paper individual questions are
referred to by their RCI question number.

RCI questions have an associated confidence scale
which asks the student to rate how confident they are in
their answer. One of five options could be selected from
the online form: guessing, unconfident, neutral, confident,
and certain. Confidence measures have occasionally
been used before with concept inventories [18-20],
including in association with the FCI [21].

Confidence information is potentially useful for gauging
the quality of students' understanding. For example,
consider a question that most students answer correctly.
If they also expressed confidence in their answers this
would suggest mastery had been achieved. This was the
case for the pair of questions 3 and 4 concerning the
constancy of the speed of light. However, if students
expressed less confidence it might indicate memorisation or
shallow understanding. This was the case for the pair of
questions 5 and 6 concerning time dilation; see Table II.

Perhaps more interesting are questions that are answered
incorrectly but for which students indicate confidence
in their answer. This indicates a potential misconception.
This was the case for question 7, concerning a twin
paradox type scenario; see Table II.



III. THE STUDENTS 

The RCI data analyzed in the rest of this paper was
obtained from the 2012 ANU Physics 2 class. This
is the second physics course taken by physics majors.
The class enrolment was ninety-nine, from whom seventy
responses were obtained for the pre-test and sixty-three
responses for the post-test, with fifty-three individuals
taking both tests.

The relativity instruction was a three-week module of
nine lectures, a three-hour simulation laboratory using
the Real Time Relativity software, and three small-group
problem-solving tutorials. It was assessed by two
sets of weekly homework, a pre-lab problem, a lab logbook,
and a mid-term exam question. The lectures were
held in a studio space to encourage interaction, and
included clicker questions and small group discussion.

The RCI was administered online in 30 minutes of
scheduled class time, although those absent from class
were able to complete it outside of class time. No significant
differences were found between those two groups.
All questions were of equal value, with no partial marks
given. The mean RCI score on the pre-test was 56%,
and on the post-test 71%. For comparison, the expected
mean score if answers were chosen randomly is 36%, with
a standard deviation of about 1% (see section IV B 1 for
further explanation). These high scores should be considered
in the context of the class being high academic
achievers, as indicated by their median Australian Tertiary
Admission Rank (ATAR) score of 95, out of a possible
99.95.

TABLE II: Questions 5, 6, 7 and 23 from the RCI. The first
three test the time dilation concept. The correct answer to
each is (a). Question 23 tests multiple concepts. The correct
answer is (d). The full RCI may be found in the Supplemental
Appendix.

In the following two questions, Abbey is in a spaceship moving
at high speed relative to Brendan, who is standing on
an asteroid (a very small rock floating in space). She flies
past him so that at t = 0, she is momentarily adjacent to
Brendan.

5. At the instant that Abbey's ship passes Brendan, she
sends two light pulses to him from her ship. If the light pulses
are emitted a nanosecond ($10^{-9}$ seconds) apart according to
Abbey's clock, what will be the time interval between the
pulses according to Brendan?

(a) Greater than one nanosecond

(b) Equal to one nanosecond

(c) Less than one nanosecond

6. Also while Abbey's ship passes Brendan, Brendan sends
two light pulses to Abbey. If Brendan sends the light pulses
a nanosecond ($10^{-9}$ seconds) apart according to his clock,
what will be the time interval between the pulses according
to Abbey?

(a) Greater than one nanosecond

(b) Equal to one nanosecond

(c) Less than one nanosecond

7. It is known that our galaxy is of the order of 100,000 light-years
in diameter. True or false: "Travelling at a constant
speed that is less than, but close to, the speed of light, in
principle it is possible for a person to cross the galaxy within
their lifetime."

(a) True

(b) False

23. If two events are separated in such a way that no observer
can be present at both events, which relationship(s)
are the same for all observers?

(a) The time between the two events

(b) The distance between the two events

(c) The order in which the events occur

(d) None of these relationships are the same for all observers

For our analysis we numerically coded the five confidence
options as: guessing (0), unconfident (0.25), neutral
(0.5), confident (0.75) and certain (1). The mean
confidence over all questions and all students was then 0.5
for the pre-test and 0.68 for the post-test. The average of
the Pearson correlation, Eq. (2), between students' confidence
and their score for each question was $\langle r_i \rangle = 0.11$
for the pre-test and $\langle r_i \rangle = 0.19$ for the post-test. Hence,
after instruction students not only became more confident
but were also more likely to answer correctly if they
expressed confidence.
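The confidence coding and the averaged confidence-score correlation described above can be sketched as follows. This is only an illustrative outline: the function name, the array layout (students by questions) and the decision to skip questions with no variation are assumptions of the sketch, not details taken from the analysis itself.

```python
import numpy as np

# Numerical coding of the five confidence options, as stated in the text.
CONFIDENCE_CODE = {
    "guessing": 0.0,
    "unconfident": 0.25,
    "neutral": 0.5,
    "confident": 0.75,
    "certain": 1.0,
}

def mean_confidence_score_correlation(confidence, correct):
    """Average over questions of the Pearson correlation between
    coded confidence and item score.

    confidence: (n_students, n_questions) coded confidences in [0, 1]
    correct:    (n_students, n_questions) item scores, 1 = right, 0 = wrong
    Questions for which either column is constant are skipped, because the
    correlation is undefined there (an assumption of this sketch).
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    r_values = []
    for q in range(confidence.shape[1]):
        c, s = confidence[:, q], correct[:, q]
        if c.std() == 0 or s.std() == 0:
            continue
        r_values.append(np.corrcoef(c, s)[0, 1])
    return float(np.mean(r_values))
```

A positive value indicates that, on average, students who express more confidence in a question are more likely to have answered it correctly.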

Interestingly, although approximately a third of the
class claimed to have had prior formal instruction in relativity
at secondary school, those students did not perform
better in either the RCI pre-test or post-test, or in the
exam relativity question.

IV. DATA ANALYSIS METHODS 

In this section we analyse the data obtained from
administering the RCI to the Physics 2 class. In section
IV A we use classical test theory to investigate the
discrimination and consistency of the RCI. In section
IV B we investigate the correlations between students'
responses to different RCI questions.

As our sample size is small, we paid particular attention
to the statistical significance of correlations. Where
possible, we calculated the probability that the observed
correlations might arise by chance from sampling noise
rather than from actual properties of the underlying population:
so-called p-values. In the language of physics and
engineering, we attempted to distinguish the signal from
the noise [25].

In the case of approximately normally distributed data
this was done using standard deviations from the mean.
Otherwise, we used either the Kolmogorov-Smirnov test
[26] or Monte Carlo simulations to calculate the probability
that the correlation could have arisen by chance.
The Kolmogorov-Smirnov test is preferred over the
chi-squared test for small sample sizes [27].



A. Classical test theory 

Classical test theory provides a set of statistics for estimating
the discrimination and consistency of a test. Discrimination
is the capability to quantify students' understanding
of the subject of the inventory. Consistency
is the extent to which each question is measuring the
same broad understanding. Overviews have been given
by Ding et al., and Ding and Beichner [29].

TABLE III: RCI post-test statistics. Sample size $N = 63$
students. The desired ranges are those suggested by Ding
and Beichner [29].

| Statistic | RCI value | Desired range |
|---|---|---|
| Mean item difficulty | 0.71 | [0.3, 0.9] |
| Mean discrimination index | 0.24 | > 0.3 |
| Ferguson's delta | 0.96 | > 0.9 |
| Mean point biserial coefficient | 0.36 | > 0.2 |
| KR20 reliability | 0.74 | > 0.7 |

Table III reports some test statistics for the RCI
post-test. The desired ranges are boundaries, according
to Ding and Beichner [29], beyond which consideration
should be given to possible problems with the inventory.
The item difficulty of question number $i$ is the fraction
of correct answers, $P_i = N_{\mathrm{correct}}/N_i$, where $N_i$ is the
total number of answers to the question. Figure 1 shows
the item difficulties for each question. The post-test RCI
item difficulty averaged over all questions, $\langle P \rangle = 0.71$,
tells us that the test was rather easy. However, as noted
in the previous section, the class was particularly accomplished.
For those questions that did not change between
the pre-test and post-test, Fig. 1 shows the pre-test item
difficulties and the normalised gain. The normalised gain
for a question is defined to be the change in item difficulty
divided by the maximum possible change in item
difficulty, $g_i = (P_{i,\mathrm{post}} - P_{i,\mathrm{pre}})/(1 - P_{i,\mathrm{pre}})$ [10]. It is the
fraction of the possible improvement that was achieved
following instruction. The RCI normalised gain averaged
over all questions was $\langle g \rangle = 0.40$. The Kolmogorov-Smirnov
test determined that the probability that
the pre and post-test results were sampled from the same
population was p = 4 x 10^^. Hence we conclude that
the normalised gain is statistically significant.

The only RCI statistic in Table III falling outside the
desired range is the mean discrimination index. This
compares the performance of students whose total RCI
results were in the top quartile with that of students in the
bottom quartile. The discrimination index for a question
is the difference between the fraction of correct answers
to that question from students in the top quartile
and from those in the bottom quartile:
$D_i = N_{i,T}/(0.25 N_i) - N_{i,B}/(0.25 N_i)$, where $N_{i,T}$ and $N_{i,B}$
are the numbers of correct answers to question $i$ from students
in the top and bottom quartiles, respectively. The mean
discrimination index is the mean of the discrimination indices
for all questions. The low RCI value in Table III is partially
due to the ease of the RCI, discussed in section III,
which reduces discrimination because the difference in
student performance between the top and bottom quartiles
is less than for a difficult test. Questions 12, 13,
14, 20 and 24 had discrimination indices $D_i < 0$. Their
range of item difficulties was $0.52 < P_i < 0.98$ with a
mean of 0.85. These questions should be reconsidered
in any RCI revisions. Indeed, in section IV B 2 we recommend
dropping question 24, concerning mass-energy
equivalence. Hence, the low mean discrimination index
suggests how the RCI might be improved. Nevertheless,
we next show that another measure of discrimination,
Ferguson's delta, is within the acceptable range.
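The quartile comparison behind the discrimination index can be illustrated with a short sketch. The function name is hypothetical, and two simplifications are assumed: each quartile contains floor(N/4) students, and ties in total score at a quartile boundary are broken by the original student order. The actual analysis may have treated these details differently.

```python
import numpy as np

def discrimination_index(correct):
    """Top-quartile minus bottom-quartile fraction correct, per question.

    correct: (n_students, n_questions) array, 1 = right, 0 = wrong.
    """
    correct = np.asarray(correct, dtype=float)
    n_students = correct.shape[0]
    n_quartile = n_students // 4          # assumption: floor(N/4) per quartile
    if n_quartile == 0:
        raise ValueError("need at least four students")
    totals = correct.sum(axis=1)
    order = np.argsort(totals, kind="stable")   # ascending total score
    bottom = correct[order[:n_quartile]]
    top = correct[order[-n_quartile:]]
    return top.mean(axis=0) - bottom.mean(axis=0)
```

An easy question, answered correctly by nearly everyone in both quartiles, necessarily has an index near zero, which is the effect of test ease described in the text.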

Ferguson's delta measures how the actual total scores
are distributed in comparison to the possible range of
scores. If only one particular score was ever achieved then
$\delta = 0$, while if all possible scores are achieved equally often
$\delta \approx 1$. Thus Ferguson's delta measures the ability of
the RCI to discriminate between students' understanding.
It is defined to be

$$\delta = \frac{N^2 - \sum_i f_i^2}{N^2 - N^2/(K+1)}, \tag{1}$$

where $f_i$ is the number of times the total score was $i$, $N$ is
the number of students, and $K$ is the number of questions. In
contrast to the discrimination index, the RCI Ferguson's
delta of $\delta = 0.96$ indicates that the RCI has adequate
discrimination. We conclude that while the discrimination
of the RCI might be improved, it is adequate.
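A small sketch of Ferguson's delta, using the standard form of Eq. (1) in which the numerator is $N^2 - \sum_i f_i^2$; the function name and argument layout are assumptions of the sketch. It reproduces the two limiting cases mentioned above.

```python
import numpy as np

def fergusons_delta(total_scores, n_questions):
    """Ferguson's delta for integer total scores between 0 and n_questions."""
    total_scores = np.asarray(total_scores, dtype=int)
    n = total_scores.size
    # f[i] is the number of students whose total score was i
    f = np.bincount(total_scores, minlength=n_questions + 1)
    return (n**2 - np.sum(f**2)) / (n**2 - n**2 / (n_questions + 1))
```

If every student obtains the same score the result is 0, and if the scores are spread evenly over all possible values it is 1.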

FIG. 1: (colour online) RCI results by question for the Physics
2 class: the post-instruction item difficulties (blue +),
pre-instruction item difficulties (black x), and the normalised
gain (red o). The horizontal axis is the question number.
The sample sizes were 63 for the post-test and
70 for the pre-test, with 53 individuals doing both tests. The
question number ordering is by post-instruction item difficulty.
Questions 18, 19 and 21 have no pre-test item difficulties
or normalised gains as they were changed between the
pre and post-tests. The actual post-test questions are given
in the Supplemental Appendix. The normalised gain is calculated
for the students who took both the pre-test and the
post-test. Hence it cannot be calculated using the plotted pre
and post scores, as they include additional students.

The Pearson correlation between random variables $X$
and $Y$ is defined to be their covariance divided by the
product of their standard deviations:

$$r_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}, \tag{2}$$

where $\mathrm{Cov}(X,Y) = \langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle$ and
$\mathrm{Var}(X) = \langle (X - \langle X \rangle)^2 \rangle$. In classical test theory the point biserial
coefficient for a question is the Pearson correlation between
its item score and the total score for the inventory.
Treating question answers as dichotomous variables, being
right or wrong, the point biserial coefficient for question
number $i$ can be expressed as [29]

$$r_{\mathrm{pbc},i} = \frac{\langle X_{r,i} \rangle - \langle X_{w,i} \rangle}{\sigma_X} \sqrt{P_i (1 - P_i)}, \tag{3}$$

where $\langle X_{r,i} \rangle$ is the mean total score for those who got the
question right, $\langle X_{w,i} \rangle$ is the mean total score for those
who got the question wrong, and $\sigma_X$ is the standard deviation
of the total score. The RCI mean point biserial
coefficient over all post-test questions of $\langle r_{\mathrm{pbc}} \rangle = 0.36$
tells us that the RCI questions are consistent in what
they measure.
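The equivalence between Eq. (3) and the Pearson correlation of item score with total score can be checked numerically. The sketch below is illustrative only: the function name and array layout are assumptions, and it uses the population standard deviation (dividing by the number of students), which is the convention under which the two expressions agree exactly.

```python
import numpy as np

def point_biserial(correct):
    """Point biserial coefficient of each question, following Eq. (3).

    correct: (n_students, n_questions) array, 1 = right, 0 = wrong.
    A question that everyone got right (or wrong) has no defined
    coefficient and produces NaN.
    """
    correct = np.asarray(correct, dtype=float)
    totals = correct.sum(axis=1)
    sigma_x = totals.std()            # population standard deviation
    p = correct.mean(axis=0)          # item difficulties
    r = np.empty(correct.shape[1])
    for i in range(correct.shape[1]):
        right = correct[:, i] == 1
        mean_right = totals[right].mean()
        mean_wrong = totals[~right].mean()
        r[i] = (mean_right - mean_wrong) / sigma_x * np.sqrt(p[i] * (1 - p[i]))
    return r
```

Comparing the output with `np.corrcoef(correct[:, i], totals)[0, 1]` for each question gives the same values up to rounding.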

The KR20 reliability statistic is another measure of the
internal consistency of the inventory. It estimates the degree
of correlation between the answers to questions. A
value near one indicates that all questions are testing the
same thing, while a value near zero indicates that the answers
are independent of each other. A value too close to
one would be undesirable for the RCI, since it is intended
to test a number of different concepts. However, as usual
in physics, the concepts are interrelated, so that a deep
understanding of relativity requires an understanding of
all concepts; so a low value is also undesirable. The KR20
reliability statistic is defined to be

$$r_{\mathrm{KR20}} = \frac{K}{K-1}\,\frac{\sigma_X^2 - \sum_{i=1}^{K} P_i (1 - P_i)}{\sigma_X^2}, \tag{4}$$

where $K = 24$ is the number of questions in the inventory.
The RCI reliability statistic of $r_{\mathrm{KR20}} = 0.74$ agrees with
the mean point biserial coefficient that the RCI questions
are consistent in what they measure.
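A brief sketch of the KR20 calculation, assuming the standard Kuder-Richardson form with the prefactor $K/(K-1)$ and the population variance of the total scores; the function name and array layout are illustrative choices.

```python
import numpy as np

def kr20(correct):
    """KR20 reliability of a test scored right (1) or wrong (0).

    correct: (n_students, n_questions) array.
    """
    correct = np.asarray(correct, dtype=float)
    k = correct.shape[1]
    p = correct.mean(axis=0)                    # item difficulties P_i
    var_total = correct.sum(axis=1).var()       # variance of total score
    return k / (k - 1) * (var_total - np.sum(p * (1 - p))) / var_total
```

When all questions are answered identically by each student, so that they measure exactly the same thing, the statistic equals one.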



B. Question correlations 

Correlations between students' responses to different
questions can provide information on the reliability of
the inventory. They can also provide information about
students' understanding, as we will show in section V A.

As usual in statistical analysis, we assume that our 
sample, the Physics 2 class, is a subset of a larger popu- 
lation that we want to understand. This might be all stu- 
dents who have taken, or will take, a similar course. We 
assume that our sample of students is randomly chosen 
from the larger population and that its statistics estimate 
those of the larger population. However, in the partic- 
ular sample, correlations can arise by chance even when 
no underlying correlation exists. Hence it is important to 
calculate the statistical significance of correlations, espe- 
cially with small sample sizes, such as ours. This tells us 
the probability that we might be misled by sample noise, 
and hence informs any action that might be taken based 
on the statistical evidence. 

For example, with twenty-four questions in the RCI
there are $(24 \times 23)/2 = 276$ possible correlations between
question pairs. These are shown in Fig. 2, as calculated
from the post-test data. To understand why this should
alter our choice of statistical significance threshold, assume
there was a hypothetical 5% chance of correlations
above a certain strength occurring between any particular
question pair, entirely due to random variation in the
data. Then we would expect to find about $276 \times 0.05 \approx 14$
such correlated question pairs by chance. Choosing an acceptance
threshold of $p < 1/276 \approx 4 \times 10^{-3}$ ensures that
in the long run less than one correlation is accepted due
to sampling noise alone. Such care is required whenever
there are many noisy channels in which a signal is being
sought. However, it comes at the cost of an increased
likelihood of missing correlations that in fact exist in the
larger population.
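The arithmetic behind this threshold is short enough to check directly. The variable names below are illustrative only.

```python
from math import comb

n_questions = 24
n_pairs = comb(n_questions, 2)            # number of distinct question pairs

# Expected number of chance "detections" if each pair had a 5% false-positive rate
expected_false_positives = n_pairs * 0.05

# Threshold at which fewer than one pair is expected to pass by chance alone
threshold = 1 / n_pairs

print(n_pairs, round(expected_false_positives, 1), round(threshold, 4))
```

The same reasoning applies to any search over many candidate correlations: the per-pair threshold has to shrink roughly in proportion to the number of pairs examined.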

A related problem is determining the significance of 
the absence of expected correlations. For example, con- 
sider two questions that were designed to test the same 
concept, but that are not significantly correlated accord- 
ing to the student data. What strength of correlation 
can the data reliably rule out? 

FIG. 2: (colour online) Histogram of the Pearson correlations
between all 276 question pairs from the post-test data. The
correlations are calculated using Eq. (6), with the $p_{XY}$ derived
from the data. The mean correlation is $\langle r \rangle = 0.1$ and the
standard deviation is 0.15.

We have addressed such questions using Monte Carlo
simulation. As this approach is not common in physics
education research, we describe it in some detail in the
next section.



1. Monte Carlo simulation 

Our Monte Carlo simulations are based on stochastic 
models of the student population. Random samples are 
drawn from the model and their distributions used to es- 
timate statistical significance. As models are simplified 
descriptions of students' responses, such estimates must 
be treated with care. Nevertheless, they help quantify 
the degree to which correlations in the data imply corre- 
lations in the larger population. 

An example, concerning means rather than correlations,
was given in section III. The standard deviation
in randomly answered mean scores was estimated from a
model in which the answer to each question was chosen
with uniform probability. The mean scores of samples
of size $N = 70$ were approximately normally distributed
with a mean of 36% and a standard deviation of about
1%. Since the pre-test mean of 56% is then about twenty
standard deviations from the mean, we can conclude that
the students are not guessing their answers.
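A random-guessing model of this kind might be simulated as in the sketch below. The number of answer options per question is a placeholder (three options for every question), not the actual RCI layout, so the simulated mean will not reproduce the 36% quoted above; the function name and the seed are likewise illustrative.

```python
import numpy as np

def guessing_mean_scores(options_per_question, n_students, n_samples, rng):
    """Class-mean scores for simulated classes in which every answer is a uniform guess.

    options_per_question: number of answer options for each question
    Returns an array of n_samples class-mean scores (fractions between 0 and 1).
    """
    p_correct = 1.0 / np.asarray(options_per_question, dtype=float)
    # guesses[s, j, i] is True when student j in sample s guesses question i correctly
    guesses = rng.random((n_samples, n_students, p_correct.size)) < p_correct
    return guesses.mean(axis=(1, 2))

# Hypothetical layout: 24 questions with three options each.
rng = np.random.default_rng(0)
means = guessing_mean_scores([3] * 24, n_students=70, n_samples=2000, rng=rng)
print(means.mean(), means.std())
```

With 70 students and 24 questions the spread of the class mean comes out at roughly one percentage point, which is why an observed mean far above the guessing level cannot plausibly be attributed to chance.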

More interesting is the estimation of the statistical significance
of correlations between two questions. Let us
call them Q1 and Q2. We code the question answers as
correct (1) or incorrect (0). There are then four possible
answers to the two questions: both correct, both incorrect,
only Q1 correct, and only Q2 correct. Our model of
the larger student population assumes that students' answers
follow the multinomial distribution over these four
possible outcomes.

TABLE IV: Post-test correlations between questions statistically
significant at the $p < 10^{-3}$ level. The Pearson correlation
is calculated using Eq. (6). The p-values were obtained
from 20,000 Monte Carlo samples for each question pair with
zero correlations between questions.

| Questions | Pearson's r | p-value |
|---|---|---|
| 5, 6 | 0.56 | $< 5 \times 10^{-5}$ |
| 1, 2 | 0.56 | $< 5 \times 10^{-5}$ |
| 11, 12 | 0.44 | $4 \times 10^{-4}$ |
| 3, 9 | 0.43 | $3 \times 10^{-4}$ |
| 15, 22 | 0.44 | $5 \times 10^{-4}$ |
| 2, 7 | 0.39 | $7 \times 10^{-4}$ |
| 9, 22 | 0.38 | $9 \times 10^{-4}$ |

Let $p_{11}$ be the probability that both questions are answered
correctly, $p_{00}$ the probability that both are answered
incorrectly, $p_{10}$ the probability that only Q1 is
answered correctly, and $p_{01}$ the probability that only Q2
is answered correctly. The multinomial probability function
is then [27]

$$\Pr(N_{11}, N_{00}, N_{10}, N_{01}) = \frac{N!}{N_{11}!\,N_{00}!\,N_{10}!\,N_{01}!}\; p_{11}^{N_{11}}\, p_{00}^{N_{00}}\, p_{10}^{N_{10}}\, p_{01}^{N_{01}}, \tag{5}$$


where $N_{XY}$ is the number of $XY$ outcomes from a sample
of $N$ answers. Three equations, in addition to the
normalization, $p_{11} + p_{00} + p_{10} + p_{01} = 1$, specify the distribution.
We take these to be the probability of a correct
answer to Q1, $P_1 = p_{11} + p_{10}$, the probability of a correct
answer to Q2, $P_2 = p_{11} + p_{01}$, and the Pearson correlation
between the answers to Q1 and Q2,

$$r_{12} = \frac{p_{11} p_{00} - p_{10} p_{01}}{\sqrt{(p_{11} + p_{10})(p_{11} + p_{01})(p_{00} + p_{10})(p_{00} + p_{01})}}. \tag{6}$$



Hence specifying $P_1$, $P_2$, and $r_{12}$ determines the distribution.
The first two are the item difficulties from the
student data. In contrast, the correlation is chosen to test
a significance hypothesis. For example, say the student
data has a correlation of $C$, and we want to know whether
this is significant. We then choose the model correlation
to be $r_{12} = 0$. Taking Monte Carlo samples from the
model [31] we can determine the probability that correlations
equal to or larger than the observed correlation $C$
arise from the model with zero correlation. If this probability
is $p$ we would say that the observed correlation is
statistically significant at the $p$ level.
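The procedure just described can be sketched with NumPy's multinomial sampler. Everything below is an illustrative outline under stated assumptions rather than the code used for the analysis: the function names are hypothetical, the cell probabilities are obtained by inverting the definitions of $P_1$, $P_2$ and $r_{12}$, and simulated samples in which a question has no variation (so the correlation is undefined) are simply discarded.

```python
import numpy as np

def cell_probabilities(P1, P2, r12):
    """Return (p11, p00, p10, p01) for item difficulties P1, P2 and model correlation r12."""
    p11 = P1 * P2 + r12 * np.sqrt(P1 * (1 - P1) * P2 * (1 - P2))
    p10 = P1 - p11
    p01 = P2 - p11
    p00 = 1.0 - p11 - p10 - p01
    probs = np.array([p11, p00, p10, p01])
    if np.any(probs < 0):
        raise ValueError("P1, P2 and r12 are not mutually compatible")
    return probs

def sample_correlations(P1, P2, r12, n_students, n_samples, rng):
    """Pearson correlations of n_samples simulated classes drawn from the model."""
    counts = rng.multinomial(n_students, cell_probabilities(P1, P2, r12), size=n_samples)
    n11, n00, n10, n01 = counts.T.astype(float)
    numerator = n11 * n00 - n10 * n01
    denominator = np.sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))
    with np.errstate(divide="ignore", invalid="ignore"):
        return numerator / denominator      # NaN where a question has zero variance

def p_value_no_correlation(C, P1, P2, n_students, n_samples=20000, seed=0):
    """Fraction of zero-correlation samples with correlation at least C."""
    rng = np.random.default_rng(seed)
    r = sample_correlations(P1, P2, 0.0, n_students, n_samples, rng)
    r = r[~np.isnan(r)]                     # assumption: drop undefined samples
    return float(np.mean(r >= C))
```

Calling `sample_correlations` with a non-zero `r12` gives the distribution needed to ask the complementary question of which assumed correlations the data can rule out. Note that with 20,000 samples the smallest non-zero p-value that can be resolved is $5 \times 10^{-5}$.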

Monte Carlo significance testing of our post-test data
found the seven correlations shown in Table IV to be significant
at the $p < 10^{-3}$ level. From the argument at the
beginning of section IV B these are unlikely to arise randomly.
The first three are expected correlations between
conceptually related questions. However, the others are
unexpected. In the next section we explain the observed
correlations between these conceptually unrelated questions
using item response theory.

It is surprising that Table IV does not contain
more correlations between conceptually related questions.
However, the fact that an observed correlation is
not statistically significant does not, in itself, justify the
conclusion that there is no correlation in the larger population.
As far as the data alone is concerned, it leaves
us uncertain either way.

One way of dealing with this problem is based on
Bayes' theorem. In our context, this approach assigns
prior probabilities to correlations. These probabilities
are then adjusted according to the statistical evidence
from the data. This has the advantage that correlations
that we have prior reason to believe exist, for
example between conceptually related RCI questions, are
less likely to be rejected as noise than correlations that
we have no prior reason to believe exist. Although we
will not use quantitative Bayesian statistics, the Bayesian
framework helps explain the lack of expected correlations
in Table IV, as our significance testing takes no account of
prior information.

Alternatively, further Monte Carlo simulations might
show that sufficiently strong correlation values are unlikely.
In cases for which we expected a correlation, this
would justify a reconsideration of our reasons for that
expectation. For example, we could select an assumed
strong correlation $C_a$ and set the model correlation equal
to it, $r_{12} = C_a$. From Monte Carlo simulations we could
then determine the probability $p$ that the simulated correlations
are equal to or less than the observed correlation
$C$, even though the model correlation is $C_a$. If this probability
is sufficiently small we may rule out the assumed
correlation at the $p$ level.



2. Item response theory 

It is reasonable to assume that a major determinant of
whether a student answers a question correctly is their
academic ability. Given a question pair, strong students
will tend to get both right, and weak students will tend
to get both wrong, strengthening the overall correlations.
If this assumption is correct, then removing that part
of students' performance due to academic ability may
increase the correlations due to conceptual relations [33].
This may be achieved using item response theory [29].

Item response theory, sometimes called Rasch analysis
[33], assumes that there is one parameter that describes
the performance of student number $j$, their ability $\theta_j$,
and one parameter, $b_i$, that describes the difficulty of
question number $i$. These are generated by a logistic
regression algorithm [13] from the student data to provide
a maximum likelihood estimate for the probability
of student $j$ getting question $i$ correct from the model

$$P_{ij} = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}. \tag{7}$$

TABLE V: Item response theory residual correlations $d_{ik}$,
statistically significant at the three-sigma level, from the
post-test data. The rightmost column is how many standard
deviations $d_{ik}$ is from the mean.

| Questions | $d_{ik}$ | $\sigma$ |
|---|---|---|
| 5, 6 | 0.08 | 4.17 |
| 1, 2 | 0.066 | 3.4 |
| 11, 12 | 0.066 | 3.4 |
| 7, 8 | -0.083 | 3.6 |
| 23, 24 | -0.086 | 3.7 |

Let $M_{ij}$ be the actual response of student $j$ to question
$i$, coded so 1 is correct and 0 incorrect. The residuals
$R_{ij} = M_{ij} - P_{ij}$ measure the deviation of the particular
student $j$ and question $i$ from the population of students
and questions with the same respective ability and difficulty.
According to item response theory these residuals
have the student ability and question difficulty factors
removed. Hence correlations between the residuals are
due to factors other than students' ability and question
difficulty.

We therefore calculated the correlations between the 
residuals for each question pair, averaged over all N stu- 
dents 

$$C_{ik} = \frac{1}{N} \sum_{j=1}^{N} R_{ij} R_{kj}. \qquad (8)$$

These correlations were found to be approximately nor- 
mally distributed with mean zero and standard deviation 
0.02. We consider the statistically significant correlations 
to be those that are more than three standard devia- 
tions from the mean, that is, with a one-sided p-value of 
$< 2 \times 10^{-3}$. Table V lists these.
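
The residual analysis of Eqs. (7) and (8) can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used for the paper: it assumes the standard one-parameter (Rasch) logistic form for the model probability, takes the abilities and difficulties as already fitted, and uses hypothetical function names and a toy response matrix.

```python
import math

def rasch_probability(theta_j, b_i):
    """Assumed Rasch form: probability that a student of ability theta_j
    answers a question of difficulty b_i correctly."""
    return 1.0 / (1.0 + math.exp(-(theta_j - b_i)))

def residuals(M, theta, b):
    """R[i][j] = M[i][j] - P[i][j] for question i and student j."""
    return [[M[i][j] - rasch_probability(theta[j], b[i])
             for j in range(len(theta))]
            for i in range(len(b))]

def residual_correlations(R):
    """C[i][k] = (1/N) * sum over students j of R[i][j] * R[k][j], as in Eq. (8)."""
    K, N = len(R), len(R[0])
    return [[sum(R[i][j] * R[k][j] for j in range(N)) / N
             for k in range(K)]
            for i in range(K)]

def significant_pairs(C, sigma, n_sigma=3.0):
    """Question pairs (1-based) with |C_ik| > n_sigma * sigma,
    assuming the C_ik are distributed about a mean of zero."""
    K = len(C)
    return [(i + 1, k + 1, C[i][k])
            for i in range(K) for k in range(i + 1, K)
            if abs(C[i][k]) > n_sigma * sigma]

# Toy data (hypothetical): 3 questions, 4 students; 1 = correct, 0 = incorrect.
M = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 0, 1]]
theta = [0.5, 1.0, -1.0, 0.0]   # assumed fitted abilities
b = [0.0, 0.0, 0.2]             # assumed fitted difficulties

R = residuals(M, theta, b)
C = residual_correlations(R)
for pair in significant_pairs(C, sigma=0.02):
    print(pair)
```

With real data the standard deviation passed to `significant_pairs` would be estimated from the distribution of all the off-diagonal $C_{ik}$, rather than fixed by hand as in this toy example.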

The three positively correlated question pairs are precisely
the conceptually related pairs in the raw score correlation
Table IV. All the other correlations in Table IV are
absent. Hence student ability, as modelled by item re- 
sponse theory, explains the correlations between the raw 
scores of conceptually unrelated questions. 

The last two rows in Table V are anti-correlations,
with one-sided p-values of $\sim 3 \times 10^{-4}$. The first
anti-correlation is surprising as both questions 7 and 8 were
designed to test the concept of time dilation, and hence
were expected to be positively correlated. However, as
we shall see in section V A, question 7 (see Table II) is
unusual in being one of the two questions having an
anti-correlation with confidence.

There is no obvious relation between the second anti- 
correlated pair, questions 23 (causality) and 24 (mass- 
energy). However, question 24 is unusual in being the 
only question with a negative normalised gain, as can be 
seen in Fig. [T]. Hence we recommend that question 24 be
dropped from the RCI. 



3. Factor analysis 

Factor analysis attempts to model students' answers in 
terms of a small number T of factors, also called latent 
traits, with T < K, the number of questions. In the 
ideal RCI case these factors would correspond to the nine 
concepts in Table U used to design the questions. Factor 



analysis reproduces the observed data, as accurately as
possible, with a linear model of the form [35]:

$$M_{ij} = P_i + \sum_{k=1}^{T} a_{ik} F_{jk} + u_i Y_{ij}, \qquad (9)$$

where $M_{ij}$ is the response of student j to question i,
introduced following Eq. ([T]). The $P_i$ are the item difficulties
for each question. The $a_{ik}$ are called the factor loadings.
The last term, $u_i Y_{ij}$, is the residual error unique to each
question. The $F_{jk}$ and $Y_{ij}$ are independent, normally
distributed, random variables with zero mean and unit 
variance. They represent the underlying larger popula- 
tion from which the data was sampled. Averaging over 
this population one finds that the factor loadings deter- 
mine the correlations between questions. Determining 
these is the primary objective of factor analysis. 
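
The statement that the factor loadings determine the correlations can be checked numerically. The sketch below is illustrative only: it simulates the linear model of Eq. (9) with continuous responses (not the binary responses of the RCI), and the loadings, unique weights, sample size, and function names are all hypothetical. For two different questions i and k the model covariance is the sum over factors of $a_{il} a_{kl}$, and the simulated sample covariance should approach it.

```python
import random

def simulate_responses(P, a, u, n_students, rng):
    """Draw M[j][i] = P[i] + sum_k a[i][k] * F[j][k] + u[i] * Y[j][i],
    with F and Y independent standard normal variables."""
    K, T = len(a), len(a[0])
    data = []
    for _ in range(n_students):
        F = [rng.gauss(0.0, 1.0) for _ in range(T)]   # common factors
        row = []
        for i in range(K):
            common = sum(a[i][k] * F[k] for k in range(T))
            unique = u[i] * rng.gauss(0.0, 1.0)       # question-specific term
            row.append(P[i] + common + unique)
        data.append(row)
    return data

def sample_covariance(data, i, k):
    """Covariance between questions i and k over the simulated students."""
    n = len(data)
    mean_i = sum(row[i] for row in data) / n
    mean_k = sum(row[k] for row in data) / n
    return sum((row[i] - mean_i) * (row[k] - mean_k) for row in data) / n

def model_covariance(a, u, i, k):
    """Population value: sum_l a[i][l] * a[k][l], plus u[i]**2 when i == k."""
    shared = sum(x * y for x, y in zip(a[i], a[k]))
    return shared + (u[i] ** 2 if i == k else 0.0)

# Hypothetical parameters: 4 questions, 2 factors.
P = [0.5, 0.5, 0.5, 0.5]
a = [[0.8, 0.0], [0.7, 0.1], [0.0, 0.6], [0.1, 0.9]]
u = [0.5, 0.6, 0.7, 0.3]

data = simulate_responses(P, a, u, n_students=20000, rng=random.Random(1))
for i, k in [(0, 1), (2, 3), (0, 2)]:
    print(i + 1, k + 1,
          round(sample_covariance(data, i, k), 3),
          round(model_covariance(a, u, i, k), 3))
```

Questions 1 and 2 share the first factor and questions 3 and 4 the second, so those pairs are strongly correlated, while questions 1 and 3 share no factor and are uncorrelated in the model.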

The applicability of factor analysis to small sample 
sizes is controversial. A commonly stated criterion is that 
meaningful factor analysis requires ten times as many re- 
sponses as questions [ij, HI] . According to this criterion, 
factor analysis of our data set would not be valid, as we 
have fewer than three times as many responses as questions K.

However, Monte Carlo studies have identified more
complex criteria that may justify factor analysis of
smaller data sets [37]-[39]. Sample sizes as small as ours,
N = 63, may be acceptable if the following three things
are all sufficiently high: the number of questions, the
ratio of the number of questions to the number of factors
[soj, and the factor communalities [37]. Communalities
measure how much of a variable's variance is due to
the factor loadings, with a sufficiently high communality
in this context being > 0.6. The average communality
for our post-test questions is 0.74. A caveat is that
these studies considered continuous data, not binary data 
like ours. Nevertheless, these studies suggest that under 
certain conditions a factor analysis of our data may be 
meaningful, despite the small sample size. 

Figure 3 shows scree plots of the eigenvalues of the
question pair correlation matrices. Factor analysis folklore
says that the number of significant factors is the
number of eigenvalues on the initial steep slope before
the transition to a constant smaller slope. From Fig. 3
this is four for the post-test data, two for the pre-test data
and none for the random data. As mentioned, such low
numbers of factors are necessary for the self-consistency
of our factor analysis [H, H^. The random data was 
generated by a Monte Carlo sampling of all individual 
question answers with equal probability. It was included 
as a consistency check that should show no significant 
factors. 

The first four factors for the post-test data have pairs
of dominant factor loadings corresponding to the conceptually
related pairs in the correlations of Tables IV and V. In
addition, the third factor is dominated by factor loadings
for questions 19 and 20 concerning the first postulate
concept. The consistency of the factor analysis results with
those reported in the previous sections supports its validity.

FIG. 3: (colour online) Scree plots of the eigenvalues resulting
from factor analyses of the post-test (red x), pre-test (blue
+), and random (gray o) data versus the corresponding factor
number. The post-test shows four significant eigenvalues, the
pre-test two, and the random data none.



V. RESULTS 

The previous section focused on statistical methods 
and their application to establishing the consistency and 
reliability of the RCI. In this section the focus is on the 
implications of the RCI results for special relativity ed- 
ucation. We first consider some of the misconceptions 
revealed by the RCI and then show that the RCI is gen- 
der biased. 



A. Misconceptions 

The RCI confidence scale was briefly described in sections
II and III. From the pre-test to the post-test the
average of the correlation between the score and confidence
for each question increased from $\langle r_i \rangle = 0.11$ to
$\langle r_i \rangle = 0.19$. Most individual questions in the post-test
had a positive correlation between confidence and score,
which indicates some mastery of the relevant concepts.
However, two questions had negative correlations: question
7 ($r_7 = -0.3$) and question 23 ($r_{23} = -0.2$), significantly
different from zero with p < 0.05. These negative
correlations suggest gaps in students' post-instruction
mastery.
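
The following sketch shows one way a score-confidence correlation and its significance could be estimated for a single question. It is an illustration under stated assumptions, not the procedure used in this paper: the Monte Carlo step is a simple permutation test, the confidence ratings are coded 1 (guessing) to 5 (certain), and the data and function names are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(score, confidence, n_trials, rng):
    """Two-sided Monte Carlo p-value: the fraction of random shufflings of
    the confidence ratings whose |r| is at least the observed |r|."""
    observed = abs(pearson(score, confidence))
    shuffled = list(confidence)
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(shuffled)
        if abs(pearson(score, shuffled)) >= observed:
            hits += 1
    return hits / n_trials

# Hypothetical answers to one question: 1 = correct, 0 = incorrect,
# with confidence coded 1 (guessing) to 5 (certain).
score = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
confidence = [2, 1, 4, 5, 2, 4, 1, 5, 3, 2]

r = pearson(score, confidence)
p = permutation_p_value(score, confidence, n_trials=2000, rng=random.Random(0))
print(f"r = {r:.2f}, p = {p:.3f}")
```

In this toy data set the students who answered incorrectly were the more confident ones, so r is negative, which is the pattern the text associates with a possible misconception.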

Question 7 is given in Table II. It had nearly equal
numbers of correct and incorrect answers: item difficulty
$P_7 = 0.54$. Of those students who rated their confidence
as either certain or confident, nearly equal numbers an- 
swered correctly and incorrectly. This indicates a mis- 
conception about time dilation, which is not captured 
by the other time dilation questions 5, 6, and 8 that 
have positive correlations between confidence and score 



of r = 0.2, 0.25, 0.4, respectively. One difference between 
these questions is that the latter are phrased in terms of 
observations, whereas question 7 is about an experience: 
travelling across the galaxy. It may be that students are 
displaying the misconception that while time dilation ap- 
plies to observations of things, it does not apply to the 
things themselves. 

The other negatively correlated question, 23, is also 
given in Table II. Most of those who answered it correctly
rated their confidence as either guessing or unconfident. 
Those students may be answering from memorised ma- 
terial, without a firm conceptual understanding. 

Questions 5 and 6 of the RCI, shown in Table II, are
a pair testing understanding of time dilation. They ask
about the same situation from two different inertial reference
frames, with each observer measuring the other's
clock to run slow. Their pre-test item difficulties were
$P_{5,\mathrm{pre}} = 0.63$ and $P_{6,\mathrm{pre}} = 0.34$, the difference being
significant at the p ~ 0.05 level. Furthermore their answers
were anti-correlated, $r_{56,\mathrm{pre}} = -0.25$, significant at the
p = 0.02 level.

Correct relativistic thinking would recognise the sym- 
metry between the two reference frames and hence 
lead to correlation between the answers. However, the 
anti-correlation suggests an asymmetry misconception in 
which A measuring B's clock to run slow implies B mea- 
suring A's clock to run fast. This is related to absolute 
motion misconceptions regarding Galilean relativity
reported by Panse et al. [41]. The following student comment
from a Real Time Relativity [2^ lab session on time
dilation is an example of both the absolute rest frame and 
asymmetry misconceptions: 

"The clocks are stationary, and I'm moving ... so my 
clock is running slow, which is why the clocks are running 
fast compared to mine ..." 

As Tables IV and V show, the post-test questions 5
and 6 were the most highly correlated of all pairs, with
$r_{56,\mathrm{post}} = 0.56$, significant at the p < 5 x 10~^ level.
This indicates that relativistic thinking has been achieved
after instruction, and the asymmetry misconception
reduced. The post-test item difficulties were $P_{5,\mathrm{post}} = 0.83$
and $P_{6,\mathrm{post}} = 0.78$, with corresponding normalised gains
of $g_5 = 0.54$ and $g_6 = 0.67$.
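
As a check on the quoted gains, the short sketch below recomputes them from the pre-test and post-test item difficulties given in the text. It assumes the usual definition of normalised gain, the achieved improvement divided by the maximum possible improvement, which reproduces the quoted values; the function name is hypothetical.

```python
def normalised_gain(pre, post):
    """Fraction of the possible improvement achieved: (post - pre) / (1 - pre).
    Here pre and post are item difficulties, i.e. fractions answering correctly."""
    if pre >= 1.0:
        raise ValueError("a pre-test value of 1 leaves no room for gain")
    return (post - pre) / (1.0 - pre)

g5 = normalised_gain(0.63, 0.83)   # question 5
g6 = normalised_gain(0.34, 0.78)   # question 6
print(f"g5 = {g5:.2f}, g6 = {g6:.2f}")   # g5 = 0.54, g6 = 0.67
```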

Evidence from class assessment items indicated that 
the asymmetry misconception also occurred for length 
contraction. However, the RCI has no symmetrical pair 
of length contraction questions to test this. Hence we rec- 
ommend that a symmetrical partner question be added 
to the existing RCI length contraction question 13. 



B. Gender Differences 

In the Physics 2 class we found statistically significant 
gender differences in the RCI results. The pre-test was 
taken by 19 females and 51 males, the post-test by 18 
females and 45 males. Of those who took both tests 
15 were female and 38 were male. As shown in Table VI,
males scored higher than females in the pre-test, the
post-test, normalised gain, and confidence. All these
differences are significant at the p < 0.05 level according
to the Kolmogorov-Smirnov test.

TABLE VI: RCI statistics by gender for the Physics 2 class.
$\langle P \rangle$ is the mean item difficulty, $\langle g \rangle$ is the mean normalised
gain, $\langle c \rangle$ is the mean confidence, $\langle x_{\mathrm{exam}} \rangle$ is the mean exam
score (fraction of possible score) for the students who did the
post-test, and $\langle x_{\mathrm{hw}} \rangle$ is the mean homework score (fraction of
possible score). The ATAR is the university admission score
discussed in section III. The p-values are the probability that the
female and male data were sampled from the same population,
so that the observed difference is due to chance.

Statistic                              Females   Males   p-value
$\langle P_{\mathrm{pre}} \rangle$     0.50      0.58    0.02
$\langle P_{\mathrm{post}} \rangle$    0.63      0.72    0.003
$\langle g \rangle$                    0.23      0.38    0.05
$\langle c_{\mathrm{pre}} \rangle$     0.41      0.53    0.02
$\langle c_{\mathrm{post}} \rangle$    0.64      0.70    0.04
$\langle x_{\mathrm{exam}} \rangle$    0.66      0.67    0.95
$\langle x_{\mathrm{hw}} \rangle$      0.75      0.75    1
$\langle \mathrm{ATAR} \rangle$        94.2      93.5    0.96

By contrast, the gender groups were statistically iden- 
tical for assessable homework and for the mid-term exam 
relativity question. There was also no difference in prior 
achievement as measured by the ATAR score (discussed
in section III).
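
For readers unfamiliar with the two-sample Kolmogorov-Smirnov test used for these comparisons, the sketch below computes its statistic, the largest gap between the two empirical cumulative distributions. It is a minimal illustration only: the p-value is estimated here by a Monte Carlo permutation of the group labels, which is an assumption and may differ from how the p-values in Table VI were obtained, and the scores and function names are hypothetical.

```python
import random

def ks_statistic(a, b):
    """Largest vertical distance between the empirical CDFs of a and b."""
    a, b = list(a), list(b)
    d = 0.0
    for x in sorted(set(a + b)):
        cdf_a = sum(1 for v in a if v <= x) / len(a)
        cdf_b = sum(1 for v in b if v <= x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_permutation_p_value(a, b, n_trials, rng):
    """Fraction of random relabellings of the pooled data whose KS statistic
    is at least as large as the observed one."""
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return hits / n_trials

# Hypothetical post-test scores (fraction correct) for two groups.
group_f = [0.50, 0.54, 0.58, 0.63, 0.67, 0.71]
group_m = [0.58, 0.67, 0.71, 0.75, 0.79, 0.83, 0.88]

d = ks_statistic(group_f, group_m)
p = ks_permutation_p_value(group_f, group_m, n_trials=2000, rng=random.Random(0))
print(f"D = {d:.2f}, p = {p:.3f}")
```

Because the test compares whole distributions rather than means, two groups with identical distributions give D = 0, while completely separated groups give D = 1.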

There were only four individual questions for which the 
gender difference was statistically significant (p < 0.05):
questions 1 and 2 concerning inertial frames, question 9 
concerning velocity addition, and question 17 concerning 
length contraction. In each of these cases the difference 
in item difficulty between males and females was > 0.27. 
For more than half the questions the magnitude of this 
difference was < 0.1. 

Similar results have been reported for the Force Concept
Inventory (FCI) [43-45] and Brief Electricity and
Magnetism Assessment (BEMA) [46]. There is a report
of the FCI gender gap being eliminated by high levels of
interactive engagement [47], although this has not been
found in other studies [48]. Other inventories have also
been found to have gender differences [49, 50].

Although some authors have claimed that multiple-choice
tests are inherently gender biased, the largest studies
have found no such effect [51], [l^.



VI. CONCLUSIONS 

Classical test theory suggests that the RCI may be too
easy and, perhaps consequently, insufficiently discriminating.
However, we do not recommend revisions, other
than those suggested below, until data from a wider range 
of students has been analysed. 

In section IV B 2 we concluded that question 24, concerning
the concept of mass-energy equivalence, should
be dropped from the RCI. It has zero discrimination, and
is the only question having a negative normalised gain between
the pre- and post-tests. It was also found to have
a strong negative correlation with an apparently unre- 
lated question. If dropped, the concept of mass-energy 
equivalence would not be tested by the RCI. 

In section V A we concluded that a frame-symmetrical
pair of length contraction questions is desirable, mirroring
the symmetrical pair of time dilation questions.
Hence we recommend that a partner question be added
to the existing RCI length contraction question 13. However,
any such question would require validation along
the lines described in sections III and IV.

The evidence presented in section V B suggests that
the RCI is gender biased. Previous work has shown similar
biases in the Force Concept Inventory and in other
concept inventories. Concept inventories are useful because
they can help evaluate innovation and hence improve
teaching. However, if their evaluations are biased
with respect to certain student groups there is a risk that
improved learning for some comes at the expense of the 
learning of others. It is a task for future physics educa- 
tion research to investigate and understand this interest- 
ing and important problem. 
Appendix A: Supplemental Appendix: The Relativity Concept Inventory



This is the version of the RCI that was used in the post-test. 



Instructions: 



• Some of the questions are multiple choice, with an additional confidence scale similar to the example below. For 
each of these questions, circle the answer that you agree most with, and mark on the scale how confident you 
are in your choice. 



• Some of the questions are in the form of statements with which you may agree or disagree. Circle the response 
that most closely corresponds to your position on the question. 



• In all of the following questions, the symbol c represents the speed of light in a vacuum, 3 x 10^8 m/s.



• Answer all of the questions to the best of your knowledge. 



o o o o o

guessing unconfident neutral confident certain
In the following two questions, Alice is standing in a train moving at velocity v from left to right relative to Bob,
who is standing on a platform. As Alice passes Bob, she drops a bowling ball out of the train's window: 




1. Ignoring air resistance, which path of the ball would Bob observe, standing on the platform? 

(a) Path (a) 

(b) Path (b) 

(c) Path (c) 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



2. Ignoring air resistance, which path of the ball would Alice observe, standing in the train? 

(a) Path (a) 

(b) Path (b) 

(c) Path (c) 



Rate how confident you are in your answer: 



o o o o o

guessing unconfident neutral confident certain



3. True or false: "In principle, it is possible for an observer following a pulse of light at a constant high speed to 
observe the light to be almost stationary." 

(a) True 

(b) False 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 
4. Consider a spaceship travelling from Earth towards a distant star at a constant high velocity v relative to Earth.
The spaceship sends a light pulse back to Earth. On Earth, the speed of this pulse is measured to be: 

(a) c 

(b) c + v 

(c) c - v



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



In the following two questions, Abbey is in a spaceship moving at high speed relative to Brendan, who is standing 
on an asteroid (a very small piece of rock floating in space). She flies past him so that at t = 0, she is momentarily 
adjacent to Brendan. 



5. At the instant that Abbey's ship passes Brendan, she sends two light pulses to him from her ship. If the light 
pulses are emitted a nanosecond (10^-9 seconds) apart according to Abbey's clock, what will be the time interval
between the pulses according to Brendan? 

(a) Greater than one nanosecond 

(b) Equal to one nanosecond 

(c) Less than one nanosecond 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



6. Also while Abbey's ship passes Brendan, Brendan sends two light pulses to Abbey. If Brendan sends the light 
pulses a nanosecond (10^-9 seconds) apart according to his clock, what will be the time interval between the
pulses according to Abbey? 

(a) Greater than one nanosecond 

(b) Equal to one nanosecond 

(c) Less than one nanosecond 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



7. It is known that our galaxy is of the order of 100,000 light-years in diameter. True or false: "Travelling at a 
constant speed that is less than, but close to, the speed of light, in principle it is possible for a person to cross 
the galaxy within their lifetime." 

(a) True 

(b) False 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



8. The Olympic Games is a two-week-long sports competition. An interested alien astronomer watches the Olympics
from a distant planet moving at high speed relative to Earth. If the alien were to compensate for the time the
light from Earth takes to reach them, they would measure the length of the Olympics to be:

(a) Greater than two weeks 

(b) Equal to two weeks 

(c) Less than two weeks 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



In the following two questions, the scenario is as follows: Alex and his friend Bianca decide to set off on separate 
voyages in identical spaceships. They each speed away from Earth in opposite directions - Alex at v = 0.75c to the
left, and Bianca at v = 0.75c to the right, relative to an observer on Earth.

9. If Alex measures the rate at which his distance to Bianca is increasing, he will obtain a value that is: 

(a) Equal to 1.5c 

(b) Greater than c but less than 1.5c 

(c) Equal to c 

(d) Greater than 0.75c but less than c 

(e) Equal to 0.75c 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



10. If Cameron, an observer on Earth, measures the rate at which the distance between Alex and Bianca is increasing, 
he will obtain a value that is: 

(a) Equal to 1.5c 

(b) Greater than c but less than 1.5c 

(c) Equal to c 

(d) Greater than 0.75c but less than c 

(e) Equal to 0.75c 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 
In the following four questions, Amanda is standing on a train travelling at high speed past Bryan, who is standing
on a platform. As she passes Bryan, she drops two bowling balls out of the window at the same time (Amanda's 
time), and from an arm's span apart. 




11. Bryan stands on the platform and watches the balls fall to the ground. If he compensates for the time that the
light from the impacts takes to reach him, in what order does Bryan measure the balls hitting the ground? 

(a) At the same time 

(b) One ball before the other 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



12. Charlotte is another passenger on the train with Amanda. If she compensates for the time that the light from
the impacts takes to reach her, in what order does Charlotte measure the balls hitting the ground?

(a) At the same time 

(b) One ball before the other 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



13. Amanda has an arm span of D meters at rest. If Bryan performs a measurement of Amanda's arm span as she 
passes him, he will obtain a value: 

(a) Greater than D 

(b) Equal to D 

(c) Less than D 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 
14. Amanda also has a height of H meters at rest. If Bryan performs a measurement of Amanda's height as she
passes him, he will obtain a value:

(a) Greater than H 

(b) Equal to H 

(c) Less than H 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



15. Two separate light bulbs emit flashes of light, distant from an observer. This observer receives the light from 
both flashes at the same time. From this alone it is possible to conclude that: 

(a) The flashes occurred at the same time for all observers 

(b) The flashes occurred at the same time for the observer at that location 

(c) The flashes occurred at the same time if the observer is not moving relative to the light bulbs 

(d) It is not possible to make any of the above conclusions 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



16. In the following thought experiment, you are in a high speed train travelling along a railway. True or false: "If 
you measure the dimensions of the train compartment, you will obtain different values than if the train were at 
rest." 

(a) True 

(b) False 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



17. Consider a futuristic space station that specialises in constructing fast spaceships. Once the ships are built, 
they leave the station at high speed for testing. As they leave the station at speed, a serial number is stamped 
instantaneously on the side of the ship by a machine on the station. This serial number has length D as measured 
by a builder on the space station. After the ship has finished its test run, it returns to the station and is parked 
in the garage. What is the length of the serial number now, as measured by the builder on the space station? 

(a) Greater than D 

(b) Equal to D 

(c) Less than D 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 
18. Adam is in a spaceship moving at v = 0.99c relative to our galaxy. Adam wants to measure the mass of his
ship by observing how resistant the ship is to acceleration. If Adam exerts a force on the ship (by turning on a 
rocket engine, for example) and measures (with an accelerometer inside the ship) the acceleration that results, 
he will obtain a value that is: 



(a) Greater than what he would measure if his ship were at rest relative to the galaxy. 

(b) Equal to what he would measure if his ship were at rest relative to the galaxy. 

(c) Less than what he would measure if his ship were at rest relative to the galaxy. 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



19. In the following thought experiment, you are in a high speed train travelling along a railway. True or false: "If 
you measure the rate at which your watch is ticking, you will obtain a different value than if the train were at 
rest." 



(a) True 

(b) False 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



20. You are in a well equipped physics lab without windows or ways of interacting with the outside world. It is 
known that the lab is in uniform motion. How do you determine the velocity of the lab? 

(a) You throw a ball across the lab and measure its change in velocity 

(b) You shine a laser beam across the lab and measure its change in velocity 

(c) Either (a) or (b) 

(d) It is not possible to determine the lab's velocity by experiment 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 
21. You observe a set of distant, spatially separated clocks that are synchronised in their rest frame. You are at
rest relative to the clocks, and you observe (through a telescope) that the times read on the clocks are different. 
This is due to: 

(a) Time dilation 

(b) Length contraction 

(c) Relativity of simultaneity 

(d) None of the above 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 

22. If two events are separated in such a way that an observer can be present at both events, which relationship(s)
between the two events are the same for all observers? 

(a) The time between the two events 

(b) The distance between the two events 

(c) The order in which the events occur 

(d) None of these relationships are the same for all observers 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 

23. If two events are separated in such a way that no observer can be present at both events, which relationship(s)
between the two events are the same for all observers? 

(a) The time between the two events 

(b) The distance between the two events 

(c) The order in which the events occur 

(d) None of these relationships are the same for all observers 

Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 

24. Consider a closed box, containing an equal amount of matter and antimatter. The total mass of this box and 
its contents is initially M. The matter and antimatter are then allowed to annihilate inside the box, turning 
into photons in the process. What is the total mass of the box and its contents after the annihilation? 

(a) Greater than M 

(b) Equal to M 

(c) Less than M 



Rate how confident you are in your answer: 

o o o o o 

guessing unconfident neutral confident certain 



